Mining Multi-label Data

نویسندگان

  • Grigorios Tsoumakas
  • Ioannis Katakis
  • Ioannis P. Vlahavas
چکیده

A large body of research in supervised learning deals with the analysis of singlelabel data, where training examples are associated with a single label λ from a set of disjoint labels L. However, training examples in several application domains are often associated with a set of labels Y ⊆ L. Such data are called multi-label. Textual data, such as documents and web pages, are frequently annotated with more than a single label. For example, a news article concerning the reactions of the Christian church to the release of the “Da Vinci Code” film can be labeled as both religion and movies. The categorization of textual data is perhaps the dominant multi-label application. Recently, the issue of learning from multi-label data has attracted significant attention from a lot of researchers, motivated from an increasing number of new applications, such as semantic annotation of images [1, 2, 3] and video [4, 5], functional genomics [6, 7, 8, 9, 10], music categorization into emotions [11, 12, 13, 14] and directed marketing [15]. Table 1 presents a variety of applications that are discussed in the literature. This chapter reviews past and recent work on the rapidly evolving research area of multi-label data mining. Section 2 defines the two major tasks in learning from multi-label data and presents a significant number of learning methods. Section 3 discusses dimensionality reduction methods for multi-label data. Sections 4 and 5 discuss two important research challenges, which, if successfully met, can significantly expand the real-world applications of multi-label learning methods: a) exploiting label structure and b) scaling up to domains with large number of labels. Section 6 introduces benchmark multi-label datasets and their statistics, while Section 7 presents the most frequently used evaluation measures for multi-label learn-

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in the text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...

متن کامل

MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection

Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...

متن کامل

Mining Multi-Label Data Streams Using Ensemble-Based Active Learning

Data stream classification has drawn increasing attention from the data mining community in recent years, where a large number of stream classification models were proposed. However, most existing models were merely focused on mining from single-label data streams. Mining from multi-label data streams has not been fully addressed yet. On the other hand, although some recent work touched the mul...

متن کامل

A Threshold Based Multi-Label Classification

In classification problems, a pattern may belong to one or multiple categories. It is essential to deal multi-label classification accurately and efficiently. Threshold strategies can be used for multi-label classification. We propose four schemes to compute threshold for a threshold based multi-label classification. We validate our method using multi-label text data and multi-label image data....

متن کامل

Multi-Label classification for Mining Big Data

In big data problems mining requires special handling of the problem under investigation to achieve accuracy and speed on the same time. In this research we investigate the multi-label classification problems for better accuracy in a timely fashion. Label dependencies are the biggest influencing factor on performance, directly and indirectly, and is a distinguishing factor for multi-label from ...

متن کامل

Classification on Multi-label Dataset Using Rule Mining Technique

Most recent work has been focused on associative classification technique. Most research work of classification has been done on single label data. But it is not appropriate for some real world application like scene classification, bioinformatics, and text categorization. So that here we proposed multi label classification to solve the issues arise in single label classification. That is very ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010